Group Task

Mencía Gómez Luna (DNI: 05468738-M), Laura Toro Iglesias (DNI: 06591204-W), Alba García Vega (DNI: 71904192M), Julia Martyniewicz(DNI: CAJ641749)

Spanish Elections!

Analyzing election data in Spain 2008-2019

Animated GIF

Required packages

rm(list = ls())
library(tidyverse)
library(dplyr)
library(stringi)
library(lubridate)
library(stringr)
library(sf)
library(ggplot2)
library(forcats)
library(tidyr)
library(patchwork)
library (glue)

Data

The practice will be based on the electoral data archives below, compiling data on elections to the Spanish Congress of Deputies from 2008 to the present, as well as surveys, municipalities codes and abbreviations.

Code
election_data <- read_csv(file = "./data/datos_elecciones_brutos.csv")
cod_mun <- read_csv(file = "./data/cod_mun.csv")
surveys <- read_csv(file = "./data/historical_surveys.csv")
abbrev <- read_csv(file = "./data/siglas.csv")

DATA PREPARATION

First, we convert the data to tidy data: In both “election_data” and “surveys” databases, each party appears as a variable; so we create a new variable that unites all these parties so they appear as categories of a single variable “party”, and the values appear in a new variable called “ballots” and “estimated_voting” respectively.

Code
surveys_tidy <- surveys |>
  pivot_longer(
    cols = 11:59,
    names_to = "party",      
    values_to = "estimated_voting"    
  )

election_data_tidy <- election_data |>
  pivot_longer(
    cols = 16:471,
    names_to = "party",      
    values_to = "ballots"
  )

Now we eliminate the non relevant parties (according to points 6 and 7) for “surveys_tidy” and “election_data_tidy”

For the “survey” database:

Code
important_parties <- c(
  "PSOE", "PP", "CS", "BNG", "EAJ-PNV", "CIU", 
  "PODEMOS", "IU", "UP", "EH-BILDU", "ERC", "VOX", "EE", "EA",
  "EH", "PCE", "MP"
)

surveys_tidy1 <- surveys_tidy |>
  mutate(party = case_when(
    party %in% important_parties ~ party,  # keep important parties
    TRUE ~ "Other"    # new category: other parties
  )) |>
  
  #Change some party names
  mutate(party = case_when(
    party == "PODEMOS" ~ "UNIDAS PODEMOS-IU",
    party == "IU" ~ "UNIDAS PODEMOS-IU",
    party == "UP" ~ "UNIDAS PODEMOS-IU",
    party == "PCE" ~ "PODEMOS-IU",
    party == "MUC" ~ "PODEMOS-IU", # party integrated into IU but minority
    party == "ICV" ~ "PODEMOS-IU", # Verdes in Valencia  go with "Podemos"
    party == "EE" ~ "PSOE", # PSOE in País Vasco
    party == "EA" ~ "EH-BILDU",
    party == "EH" ~ "EH-BILDU",
    party == "COMPROMIS" ~ "PODEMOS-IU",
    party == "JXCAT-JUNTS" ~ "CIU",
    party == "PDECAT" ~ "CIU",
    TRUE ~ party  # keep the rest without changes
  ))

We check which parties we have now in the first surveys_tidy

Code
unique(surveys_tidy1$party)
 [1] "Other"             "PSOE"              "PODEMOS-IU"       
 [4] "CIU"               "EAJ-PNV"           "ERC"              
 [7] "UNIDAS PODEMOS-IU" "EH-BILDU"          "PP"               
[10] "BNG"               "CS"                "VOX"              
[13] "MP"               

We do the same for the “election_data_tidy” database.

In this case we don’t do it manually as before since each party has many branches, federations and coalitions. That is why we use the grepl() code. We also make sure that it groups all the parties regardless of capital letters, accents or hyphens.

Additional notes:

  • Since we didn’t add Junts for Convergencia i Unio (as it wasn’t in the instructions), and the party dissolved in 2015, they don’t get votes from 2015 onwards.

  • We noticed that in election data, Más País only has votes for the november 2019 elections, that’s why it doesn’t appear in other elections after we clean the data.

Code
important_parties2 <- c("PODEMOS", "ARALAR", "EUSKO ALKARTASUNA", "EN MAREA", "EQUO",  "ESQUERRA REPUBLICANA") # exceptions that we add manually

election_data_tidy1 <- election_data_tidy |>
  mutate(party = case_when(
    party %in% important_parties2 ~ party,            
    grepl("PARTIDO NACIONALISTA VASCO", party, ignore.case = TRUE) ~ "PARTIDO NACIONALISTA VASCO",
    grepl("BLOQUE NACIONALISTA GALEGO", party, ignore.case = TRUE) ~ "BLOQUE NACIONALISTA GALEGO",
    grepl("VOX", party, ignore.case = TRUE) ~ "VOX",
    grepl("ESQUERRA REPUBLICANA DE CATALUNYA", party, ignore.case = TRUE) ~ "ESQUERRA REPUBLICANA DE CATALUNYA",
    grepl("CONVERGENCIA i UNIO", stri_trans_general(party, "Latin-ASCII"), ignore.case = TRUE) ~ "CONVERGÈNCIA i UNIÓ",
    grepl("MAS PAIS", stri_trans_general(party, "Latin-ASCII"), ignore.case = TRUE) ~ "MÁS PAÍS",
    grepl("PODEMOS|UNIDAS PODEMOS|UNIDOS PODEMOS|IZQUIERDA UNIDA|UNITS PODEM|XUNIES PODEMOS", stri_trans_general(party, "Latin-ASCII"), ignore.case = TRUE) ~ "PODEMOS-IU", 
    grepl("PARTIDO POPULAR|PARTIT POPULAR|PP", party, ignore.case = TRUE) ~ "PARTIDO POPULAR",
    grepl("PARTIDO SOCIALISTA|PSOE", party, ignore.case = TRUE) ~ "PARTIDO SOCIALISTA OBRERO ESPAÑOL",
    grepl("PARTIDO DE LA CIUDADANIA", stri_trans_general(party, "Latin-ASCII"), ignore.case = TRUE) ~ "CIUDADANOS PARTIDO DE LA CIUDADANÍA",
    grepl ("EUSKAL HERRIA BILDU", party, ignore.case = TRUE) ~ "EUSKAL HERRIA BILDU",
    TRUE ~ "Other"       
  ))

election_data_tidy2 <- election_data_tidy1 |> 
  mutate(party = case_when(
    party == "PODEMOS" ~ "PODEMOS-IU",
    party == "EN MAREA" ~ "PODEMOS-IU",
    party == "EQUO" ~ "PODEMOS-IU",
    party == "EUSKAL HERRIA BILDU (EH Bildu)" ~ "EH BILDU",
    party == "EUSKAL HERRIA BILDU" ~ "EH BILDU",
    party == "ESQUERRA REPUBLICANA" ~ "ESQUERRA REPUBLICANA DE CATALUNYA",
    party == "ARALAR" ~ "EH BILDU",
    party == "EUSKO ALKARTASUNA" ~ "EH BILDU",
    TRUE ~ party  # keep the rest without changes
  ))

unique(election_data_tidy2$party) # to check the results 
 [1] "Other"                               "EH BILDU"                           
 [3] "PARTIDO POPULAR"                     "PARTIDO SOCIALISTA OBRERO ESPAÑOL"  
 [5] "PARTIDO NACIONALISTA VASCO"          "CIUDADANOS PARTIDO DE LA CIUDADANÍA"
 [7] "PODEMOS-IU"                          "CONVERGÈNCIA i UNIÓ"                
 [9] "ESQUERRA REPUBLICANA DE CATALUNYA"   "BLOQUE NACIONALISTA GALEGO"         
[11] "VOX"                                 "MÁS PAÍS"                           

Animated GIF

We eliminate those columns that don’t provide extra information where there is no variance. We apply this process to the “election” and “survey” data.

Code
election_data_tidy3 <- election_data_tidy2 |>
  select(-tipo_eleccion, -vuelta, -codigo_distrito_electoral)

Also, according to point 11, we have to discard some polls, so we filter:

Code
surveys_tidy2 <- surveys_tidy1 |>
  mutate(
    date_elec = ymd(date_elec),
    field_date_from = ymd(field_date_from),
    field_date_to = ymd(field_date_to),
    fieldwork_days = as.numeric(difftime(field_date_to, field_date_from, units = "days")) # we calculate the fieldwork days
  ) |>
  filter(
    date_elec >= "2008-01-01",    # elections >= 2008
    exit_poll == "FALSE",         # no exit polls
    size >= 750,                 # size >= 750  and known
    fieldwork_days > 1            # fieldwork days > 1 
    ) |>
  select (-type_survey) # discard unimportant columns

JOINS:

Possible joins that may be useful:

  1. Joining cod_mun and election_data_tidy2 to have the full name of municipalities (will be useful for question 1). First we create a new variable in election_data_tidy2 that is the full code of the municipality, using the columns “codigo_ccaa”, “codigo_provincia” and “codigo_municipio”, and then we join:
Code
election_data_tidy_new <- election_data_tidy3 |> 
  mutate(cod_mun_full = paste(codigo_ccaa, codigo_provincia, codigo_municipio, sep = "-"))

#the same with the "glue" library: 

#election_data_tidy_new <- election_data_tidy3 |>
  #mutate(cod_mun_full = glue("{codigo_ccaa}-{codigo_provincia}-{codigo_municipio}"))

# left join to have the names of municipalities

election_data_tidy_new <- election_data_tidy_new |> 
  left_join(cod_mun, by = c("cod_mun_full" = "cod_mun"))

Now we have a table with the info from election_data and cod_mun.

  1. Join abbreviations to election and survey data:

First, we have to clean the names from “abbrev” database:

Code
abbrev_clean <- abbrev |>
  mutate(denominacion = case_when(
    denominacion %in% important_parties2 ~ denominacion,  # same parties as before          
     grepl("PARTIDO NACIONALISTA VASCO", denominacion, ignore.case = TRUE) ~ "PARTIDO NACIONALISTA VASCO",
    grepl("BLOQUE NACIONALISTA GALEGO", denominacion, ignore.case = TRUE) ~ "BLOQUE NACIONALISTA GALEGO",
    grepl("VOX", denominacion, ignore.case = TRUE) ~ "VOX",
    grepl("ESQUERRA REPUBLICANA DE CATALUNYA", denominacion, ignore.case = TRUE) ~ "ESQUERRA REPUBLICANA DE CATALUNYA",
    grepl("CONVERGENCIA i UNIO", stri_trans_general(denominacion, "Latin-ASCII"), ignore.case = TRUE) ~ "CONVERGÈNCIA i UNIÓ",
    grepl("MAS PAIS", stri_trans_general(denominacion, "Latin-ASCII"), ignore.case = TRUE) ~ "MÁS PAÍS",
    grepl("PODEMOS|UNIDAS PODEMOS|UNIDOS PODEMOS|IZQUIERDA UNIDA|UNITS PODEM|XUNIES PODEMOS", stri_trans_general(denominacion, "Latin-ASCII"), ignore.case = TRUE) ~ "PODEMOS-IU", 
    grepl("PARTIDO POPULAR|PARTIT POPULAR|PP", denominacion, ignore.case = TRUE) ~ "PARTIDO POPULAR",
    grepl("PARTIDO SOCIALISTA|PSOE", denominacion, ignore.case = TRUE) ~ "PARTIDO SOCIALISTA OBRERO ESPAÑOL",
    grepl("PARTIDO DE LA CIUDADANIA", stri_trans_general(denominacion, "Latin-ASCII"), ignore.case = TRUE) ~ "CIUDADANOS PARTIDO DE LA CIUDADANÍA",
    grepl ("EUSKAL HERRIA BILDU", denominacion, ignore.case = TRUE) ~ "EUSKAL HERRIA BILDU",
    TRUE ~ "Other"       
  ))

abbrev_clean2 <- abbrev_clean |> 
  mutate(denominacion = case_when(
    denominacion == "PODEMOS" ~ "PODEMOS-IU",
    denominacion == "EN MAREA" ~ "PODEMOS-IU",
    denominacion == "EQUO" ~ "PODEMOS-IU",
    denominacion == "EUSKAL HERRIA BILDU (EH Bildu)" ~ "EH BILDU",
    denominacion == "EUSKAL HERRIA BILDU" ~ "EH BILDU",
    denominacion == "ESQUERRA REPUBLICANA" ~ "ESQUERRA REPUBLICANA DE CATALUNYA",
    denominacion == "ARALAR" ~ "EH BILDU",
    denominacion == "EUSKO ALKARTASUNA" ~ "EH BILDU",
    TRUE ~ denominacion  # keep the rest without changes
  ))

unique(abbrev_clean2$denominacion) #To check the results
 [1] "Other"                               "PARTIDO NACIONALISTA VASCO"         
 [3] "PODEMOS-IU"                          "EH BILDU"                           
 [5] "PARTIDO POPULAR"                     "PARTIDO SOCIALISTA OBRERO ESPAÑOL"  
 [7] "ESQUERRA REPUBLICANA DE CATALUNYA"   "CONVERGÈNCIA i UNIÓ"                
 [9] "BLOQUE NACIONALISTA GALEGO"          "CIUDADANOS PARTIDO DE LA CIUDADANÍA"
[11] "VOX"                                 "MÁS PAÍS"                           
Code
# change siglas

abbrev_clean2 <- abbrev_clean2  |> 
  mutate(siglas = case_when(
    denominacion == "PARTIDO SOCIALISTA OBRERO ESPAÑOL" ~ "PSOE",
    denominacion == "PARTIDO POPULAR" ~ "PP",
    denominacion == "PODEMOS-IU" ~ "PODEMOS-IU",
    denominacion == "EH BILDU" ~ "EH-BILDU",
    denominacion == "ESQUERRA REPUBLICANA DE CATALUNYA" ~ "ERC",
    denominacion == "BLOQUE NACIONALISTA GALEGO" ~ "BNG",
    denominacion == "VOX" ~ "VOX",
    denominacion == "MÁS PAÍS" ~ "MP",
    denominacion == "PARTIDO NACIONALISTA VASCO" ~ "EAJ-PNV",
    denominacion == "CIUDADANOS PARTIDO DE LA CIUDADANÍA" ~ "CS",
    denominacion == "Other" ~ "Other", 
    denominacion == "CONVERGÈNCIA i UNIÓ" ~ "CIU",
    TRUE ~ siglas
  ))

unique(abbrev_clean2$siglas)
 [1] "Other"      "EAJ-PNV"    "PODEMOS-IU" "EH-BILDU"   "PP"        
 [6] "PSOE"       "ERC"        "CIU"        "BNG"        "CS"        
[11] "VOX"        "MP"        
Code
# remove duplicates for the join
abbrev_clean2_unique <- abbrev_clean2 |> 
  distinct(denominacion, .keep_all = TRUE)

Then we do the join, which we can do both with the survey and the election data, so we have a more complete databases. We also reorder and rename some of the new variables so it’s clearer:

Code
# election data
final_election_data <- election_data_tidy_new |> 
  left_join(abbrev_clean2_unique, by = c("party" = "denominacion"))

final_election_data <- final_election_data |> 
  relocate(siglas, .after = party)


# survey data
final_survey_data <- surveys_tidy2 |> 
  left_join(abbrev_clean2_unique, by = c("party" = "siglas"))

final_survey_data <- final_survey_data |> 
  relocate(denominacion, .before = party) |> 
  rename(
    abbrev = party,
    party = denominacion
  )

final_survey_data <- final_survey_data |> 
  mutate(party = if_else(is.na(party), "Other", party))

Animated GIF

QUESTIONS

Question 1

Which party was the winner in the municipalities with more than 100,000 inhabitants (census) in each of the elections?

Code
# we filter the large municipalities
large_municipalities <- final_election_data |> 
  filter(censo > 100000)

# we sum up the total votes grouping by municipality, party, year and month
votes_by_party <- large_municipalities |> 
  group_by(municipio, party, anno, mes) |> 
  summarise(total_votes = sum(ballots, na.rm = TRUE), .groups = "drop")

# we check the winners by municipality
winners_by_municipality <- votes_by_party |> 
  group_by(municipio, anno, mes) |> 
  filter(total_votes == max(total_votes)) |> 
  ungroup()

print(winners_by_municipality) 
# A tibble: 282 × 5
   municipio         party                              anno mes   total_votes
   <chr>             <chr>                             <dbl> <chr>       <dbl>
 1 Albacete          PARTIDO POPULAR                    2008 03          49909
 2 Albacete          PARTIDO POPULAR                    2011 11          55858
 3 Albacete          PARTIDO POPULAR                    2015 12          36149
 4 Albacete          PARTIDO POPULAR                    2016 06          38470
 5 Albacete          PARTIDO SOCIALISTA OBRERO ESPAÑOL  2019 04          28729
 6 Albacete          PARTIDO SOCIALISTA OBRERO ESPAÑOL  2019 11          27074
 7 Alcalá de Henares PARTIDO POPULAR                    2008 03          48087
 8 Alcalá de Henares PARTIDO POPULAR                    2011 11          46667
 9 Alcalá de Henares PARTIDO POPULAR                    2015 12          30927
10 Alcalá de Henares PARTIDO POPULAR                    2016 06          33949
# ℹ 272 more rows

Question 2

Which party was the second when the first was the PSOE? And when the first was the PP?

When PSOE was first:

Code
# first we do the same as Q1 but for all municipalities 

votes_by_party_all <- final_election_data |> 
  group_by(municipio, siglas, anno, mes) |> 
  summarise(total_votes = sum(ballots, na.rm = TRUE), .groups = "drop")

winners_by_municipality_all <- votes_by_party_all |> 
  group_by(municipio, anno, mes) |> 
  filter(total_votes == max(total_votes)) |> 
  ungroup()

# we extract the second when PSOE was first

psoe_winner <- winners_by_municipality_all |> filter(siglas == "PSOE")

second_place_psoe_first <- votes_by_party_all |> filter(municipio %in% psoe_winner$municipio) |> 
  filter(!(siglas == "PSOE")) |> 
  group_by(municipio, anno, mes) |> 
  filter(total_votes == max(total_votes)) |> 
  ungroup()

results_psoe_winner <- psoe_winner |>
  rename(winner_party = siglas, winner_votes = total_votes) |> 
  left_join(
    second_place_psoe_first |> 
      rename(second_party = siglas, second_votes = total_votes),
    by = c("municipio", "anno", "mes")
  ) |> 
  relocate(anno, mes, .after = municipio)

results_psoe_winner # we see the results in column "second_party"
# A tibble: 14,113 × 7
   municipio  anno mes   winner_party winner_votes second_party second_votes
   <chr>     <dbl> <chr> <chr>               <dbl> <chr>               <dbl>
 1 Ababuj     2019 04    PSOE                   13 PP                     13
 2 Abadiño    2008 03    PSOE                 1243 EAJ-PNV              1050
 3 Abadía     2008 03    PSOE                  148 PP                     72
 4 Abadía     2011 11    PSOE                  126 PP                     89
 5 Abadía     2015 12    PSOE                  114 PP                     60
 6 Abadía     2016 06    PSOE                  111 PP                     68
 7 Abadía     2019 04    PSOE                  128 PP                     34
 8 Abadía     2019 11    PSOE                  112 PP                     31
 9 Abadía     2019 11    PSOE                  112 VOX                    31
10 Abajas     2008 03    PSOE                   14 PP                      9
# ℹ 14,103 more rows

When PP was first:

Code
# we extract the second when PP was first in the same way as before
pp_winner <- winners_by_municipality_all |> filter(siglas == "PP")

second_place_pp_first <- votes_by_party_all |> filter(municipio %in% pp_winner$municipio) |> 
  filter(!(siglas == "PP")) |> 
  group_by(municipio, anno, mes) |> 
  filter(total_votes == max(total_votes)) |> 
  ungroup()

results_pp_winner <- pp_winner |> rename(winner_party = siglas, winner_votes = total_votes) |> 
  left_join(
    second_place_pp_first |> 
      rename(second_party = siglas, second_votes = total_votes),
    by = c("municipio", "anno", "mes")
  ) |> 
  relocate(anno, mes, .after = municipio)

results_pp_winner #Results in column "Second party"
# A tibble: 26,531 × 7
   municipio  anno mes   winner_party winner_votes second_party second_votes
   <chr>     <dbl> <chr> <chr>               <dbl> <chr>               <dbl>
 1 Ababuj     2008 03    PP                     27 PSOE                   25
 2 Ababuj     2011 11    PP                     28 PSOE                   10
 3 Ababuj     2015 12    PP                     22 PODEMOS-IU             14
 4 Ababuj     2016 06    PP                     26 PODEMOS-IU             10
 5 Ababuj     2019 04    PP                     13 PSOE                   13
 6 Ababuj     2019 11    PP                     12 Other                  11
 7 Abades     2008 03    PP                    363 PSOE                  229
 8 Abades     2011 11    PP                    353 PSOE                  146
 9 Abades     2015 12    PP                    276 PSOE                  153
10 Abades     2016 06    PP                    289 PSOE                  115
# ℹ 26,521 more rows

Now we see it in a graph to see the results more clearly for each year. When PSOE first

Code
second_party_year_psoe_summary <- results_psoe_winner |> 
  group_by(second_party, anno) |> 
  summarise(count = n(), .groups = "drop")

ggplot(second_party_year_psoe_summary, aes(x = factor(anno), y = count, fill = second_party)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("BNG" = "pink", "CS" = "orange",
                    "CIU" = "darkred", "EH-BILDU" = "darkgreen", "ERC" = "yellow", "Other" = "grey", "EAJ-PNV" = "lightblue", "PP" = "blue", "PODEMOS-IU" = "purple", "VOX" = "green")) +
  labs(title = "Distribution of second places by year when PSOE wins",
       x = "Year",
       y = "Number of municipalities",
       fill = "Party") +
  theme_minimal()

A stacked bar graph for each year is a good option since we can see the differences between the most voted parties after the PSOE and at the same time we see how it has changed over the years/elections.

We can check how PP has always been the most voted after PSOE, although since 2015 when Podemos-IU appears and the two-party system begins to break down and other parties start to appear. Especially in 2019 we see more fracture, since VOX seems to be the third most voted, followed by Podemos and Ciudadanos.

Animated GIF

When PP first

Code
second_party_year_pp_summary <- results_pp_winner |> 
  group_by(second_party, anno) |> 
  summarise(count = n(), .groups = "drop")

ggplot(second_party_year_pp_summary, aes(x = factor(anno), y = count, fill = second_party)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("BNG" = "pink", "CS" = "orange",
                    "CIU" = "darkred", "EH-BILDU" = "darkgreen", "ERC" = "yellow", "Other" = "grey", "EAJ-PNV" = "lightblue", "PP" = "blue", "PODEMOS-IU" = "purple", "VOX" = "green", "PSOE" = "red")) +
  labs(title = "Distribution of second places by year when PP wins",
       x = "Year",
       y = "Number of municipalities",
       fill = "Party") +
  theme_minimal()

Here we see much more diversity than when the PSOE was first, since people have been voting for a greater diversity of parties since 2008. This can be explained because the left is much more divided than the right and the alternative to the PP was not only PSOE but other parties. However, as before, since 2015 we see other parties appear that are voted in large proportion, such as Podemos and Ciudadanos, and in 2019 also VOX.

Question 3

Who benefits from low turnout?

First of all, what is a low turnout? It makes reference to the electoral districts where the electoral participation is low. We consider low turnout <0,6 based on the turnout tendency in Spain (We can not compare with other countries because in some like USA 60% is moderate/high and in other like Sweden 60% is very very low). So we compare participatory trend in Spain during democracy, where the average participation rate in general elections in Spain is 72.1%, so <70% would be moderate and lower than usual and <60% would be low. (20 Minutos, 2023)

Code
# we calculate participation and we classify it
turnout_election_data <- final_election_data|>
  mutate(turnout = (participacion_1 + participacion_2) / censo) |> 
  mutate(turnout_bin = ifelse(turnout < 0.6, "Low Turnout", "High Turnout"))

# function to analyze the effect of low participation:

analyze_low_turnout <-
  function(data, turnout_col = "turnout", ballots_col = "ballots", censo_col = "censo", party_col = "siglas") {
  # calculate vote share
  data <- data |>
    group_by(codigo_municipio) |>
    mutate(vote_share = !!sym(ballots_col) / !!sym(censo_col))
  
  # calculate turnout and bin it into low and high categories
  data <- data |>
    mutate(turnout_bin = ifelse(!!sym(turnout_col) < 0.6, 
                                "Low Turnout", 
                                "High Turnout"))
  
  # Summarize vote share by turnout category and party
  summary <- data |>
    group_by(!!sym(party_col), turnout_bin) |>
    summarise(avg_vote_share = mean(vote_share, na.rm = TRUE), .groups = "drop") |>
    arrange(desc(avg_vote_share))
  
  return(summary)
}
Code
# we apply the function

turnout_analysis <- analyze_low_turnout(turnout_election_data, 
                                        turnout_col = "turnout", 
                                        ballots_col = "ballots", 
                                        censo_col = "censo", 
                                        party_col = "siglas")
print(turnout_analysis)
# A tibble: 24 × 3
   siglas   turnout_bin  avg_vote_share
   <chr>    <chr>                 <dbl>
 1 PP       High Turnout          0.284
 2 CIU      High Turnout          0.260
 3 EAJ-PNV  High Turnout          0.248
 4 CIU      Low Turnout           0.248
 5 PP       Low Turnout           0.245
 6 PSOE     High Turnout          0.230
 7 ERC      High Turnout          0.171
 8 PSOE     Low Turnout           0.168
 9 EAJ-PNV  Low Turnout           0.163
10 EH-BILDU High Turnout          0.143
# ℹ 14 more rows

Visualization of the results:

Code
ggplot(turnout_analysis, aes(x = siglas, y = avg_vote_share, fill = turnout_bin)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Vote Share by Turnout Level and Party",
       x = "Party",
       y = "Average Vote Share (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The parties that benefit from low turnout are: Podemos-IU, VOX and Other parties, although it’s not a big difference.

Question 4

How to analyze the relationship between census and vote? Is it true that certain parties win in rural areas?

In Spain is considered “rural” a population of <30000 inhabitants according to Ley 45/2007, de 13 de diciembre, «BOE» núm. 299, de 14 de diciembre de 2007, páginas 51339 a 51349 (11 págs.)

Code
# classify rural vs urban
election_data_areas <- final_election_data |> 
  mutate(area_type = case_when(
    censo < 30000 ~ "Rural",
    TRUE ~ "Urban"
  ))  |> 
  group_by(area_type, siglas)  |> 
  summarise(total_votes = sum(ballots, na.rm = TRUE))  |> 
  mutate(vote_share = total_votes / sum(total_votes))

# proportion of votes per party in each region
vote_share_table <- election_data_areas |> 
  group_by(siglas) |> 
  mutate(total_votes_party = sum(total_votes)) |> 
  ungroup() |> 
  mutate(vote_share_adjusted = total_votes / total_votes_party) |> 
  select(siglas, area_type, vote_share_adjusted) |> 
  pivot_wider(names_from = area_type, values_from = vote_share_adjusted) |> 
  mutate(Rural = Rural * 100, Urban = Urban * 100) |> 
  select(siglas, Rural, Urban)

vote_share_table
# A tibble: 12 × 3
   siglas     Rural Urban
   <chr>      <dbl> <dbl>
 1 BNG         65.6  34.4
 2 CIU         51.8  48.2
 3 CS          39.2  60.8
 4 EAJ-PNV     52.2  47.8
 5 EH-BILDU    64.6  35.4
 6 ERC         53.1  46.9
 7 MP          21.4  78.6
 8 Other       44.8  55.2
 9 PODEMOS-IU  41.4  58.6
10 PP          45.5  54.5
11 PSOE        48.5  51.5
12 VOX         43.2  56.8

We express the results in a plot:

Code
# convert table to long format for ggplot
vote_share_long <- vote_share_table |>  
  pivot_longer(cols = c(Rural, Urban), names_to = "area_type", values_to = "vote_share")

# plot
ggplot(vote_share_long, aes(x = siglas, y = vote_share, fill = area_type)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("Rural" = "#8FCB9B", "Urban" = "#4A90E2")) +
  labs(title = "Vote Share by Party and Area Type (100%)", 
       x = "Party", 
       y = "Vote Share (%)", 
       fill = "Area Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

We chose a stacked bar graph to see for each party how many of its voters are from the urban area and the rural area and we see the following: There are parties that stand out for having more votes from urban areas, such as Más País and Ciudadanos, perhaps explained by the policies that the have more focused on city problems (climate change, modernization, etc.).

On the other hand, there are parties with a higher percentage of voters from rural areas, such as BNG, EH-Bildu, CiU and ERC. This may be due to its focus on defending regional identity, autonomy and local development, topics that resonate especially in these areas. In addition, his opposition to national parties and his historical connection with rural communities reinforce his support.

Question 5

How to calibrate the error of the polls (remember that the polls are voting intentions at national level)?

Code
# first, let's compute the total votes for each party in each election
total_votes_by_party <- final_election_data |>
  group_by(anno, mes, party) |>
  summarize(total_ballots = sum(ballots, na.rm = TRUE), .groups = "drop") |> 
   group_by(anno, mes) |> 
   mutate(total_ballots_election = sum(total_ballots, na.rm = TRUE)) |> 
    ungroup() |> 
   mutate(real_percentage = total_ballots / total_ballots_election) |> 
  select(anno, mes, party, real_percentage) |> 
  mutate(mes = as.integer(mes)) # Convert mes to integer (there was a problem without it before)

# now, let's average survey data for each party for each election
survey_estimates <- final_survey_data |>
  filter(!is.na(estimated_voting)) |> 
  group_by(date_elec, party) |>
  summarize(mean_estimated_voting = mean(estimated_voting, na.rm = TRUE), .groups = "drop")|>
   mutate(anno = year(date_elec), mes = month(date_elec)) |> 
    select(anno, mes, party, mean_estimated_voting)

# now we'll merge the two datasets (election results and survey estimates)
# before, we change some party names so they fit
survey_estimates <- survey_estimates |> 
  mutate(party = case_when(
    party == "CIUDADANOS PARTIDO DE LA CIUDADANÍA" ~ "CIUDADANOS PARTIDO DE LA CIUDADANÍA",
    party == "PARTIDO POPULAR" ~ "PARTIDO POPULAR",
    party == "PARTIDO SOCIALISTA OBRERO ESPAÑOL" ~ "PARTIDO SOCIALISTA OBRERO ESPAÑOL",
    party == "UNIDAS PODEMOS-IU" ~ "PODEMOS-IU",
    party == "VOX" ~ "VOX",
        party == "EH-BILDU" ~ "EH BILDU",
    TRUE ~ party
  ))

# we merge
poll_error_data <- left_join(total_votes_by_party, survey_estimates, by = c("anno", "mes", "party"))

# calculate the absolute error for each survey for each party
poll_error_data <- poll_error_data |> 
  mutate(error_percentage = abs(real_percentage - (mean_estimated_voting/100)))

# Print the data with the error (see the table below)
print(poll_error_data)
# A tibble: 72 × 6
    anno   mes party      real_percentage mean_estimated_voting error_percentage
   <dbl> <dbl> <chr>                <dbl>                 <dbl>            <dbl>
 1  2008     3 BLOQUE NA…         0.00835                  1.08          0.00248
 2  2008     3 CIUDADANO…         0.00182                 NA            NA      
 3  2008     3 CONVERGÈN…         0.0309                   3.28          0.00196
 4  2008     3 EH BILDU           0.00250                 NA            NA      
 5  2008     3 ESQUERRA …         0.0116                   2.33          0.0118 
 6  2008     3 MÁS PAÍS           0                       NA            NA      
 7  2008     3 Other              0.127                    4.53          0.0812 
 8  2008     3 PARTIDO N…         0.0121                   1.79          0.00581
 9  2008     3 PARTIDO P…         0.406                   38.4           0.0218 
10  2008     3 PARTIDO S…         0.375                   42.2           0.0472 
# ℹ 62 more rows

The code calculates the polling error by comparing real election results with average survey estimates. It first computes the actual vote percentages (real_percentage) from election data by dividing party ballots by total ballots. Then, it calculates the average predicted vote percentages (mean_estimated_voting) from survey data. The datasets are merged, and the absolute error is computed as the difference between real percentages and polling predictions. The result is a tibble showing errors for each party in each election.

Example: In 2008, BNG had a real percentage of ~0.835% and a polling estimate of 1.08%, resulting in an error of ~0.00248.

Question 6

Which polling houses got it right the most and which ones deviated the most from the results?

Most Accurate Polling Houses:

Code
# Calculate real vote percentages from election data
real_votes <- final_election_data |>
  group_by(anno, mes, party) |>
  summarize(total_ballots = sum(ballots, na.rm = TRUE), .groups = "drop") |>
    group_by(anno, mes) |>
      mutate(total_ballots_election = sum(total_ballots, na.rm = TRUE)) |>
  ungroup() |>
  mutate(real_percentage = total_ballots / total_ballots_election) |>
  select(anno, mes, party, real_percentage) |>
    mutate(mes = as.integer(mes)) # Convert mes to integer


# join real vote percentages with survey data
# adjust party names in survey data to match election data before the join
final_survey_data_adjusted <- final_survey_data |>
    mutate(party = case_when(
        party == "CIUDADANOS PARTIDO DE LA CIUDADANÍA" ~ "CIUDADANOS PARTIDO DE LA CIUDADANÍA",
        party == "PARTIDO POPULAR" ~ "PARTIDO POPULAR",
        party == "PARTIDO SOCIALISTA OBRERO ESPAÑOL" ~ "PARTIDO SOCIALISTA OBRERO ESPAÑOL",
        party == "UNIDAS PODEMOS-IU" ~ "PODEMOS-IU",
        party == "VOX" ~ "VOX",
        party == "EH-BILDU" ~ "EH BILDU",
        TRUE ~ party
      )) |>
    mutate(mes = month(date_elec), anno= year(date_elec))


survey_with_real <- final_survey_data_adjusted |>
  left_join(real_votes, by = c("anno", "mes", "party"))
Code
# calculate the error for each pollster
pollster_error <- survey_with_real |>
  group_by(pollster) |>
  summarize(average_error = mean(abs(estimated_voting - (real_percentage * 100)), na.rm = TRUE), .groups = "drop")

# identify the pollsters with the smallest average error (most accurate)
best_pollsters <- pollster_error |>
  arrange(average_error)

# identify the pollsters with the biggest average error (least accurate)
worst_pollsters <- pollster_error |>
  arrange(desc(average_error))

print(head(best_pollsters))
# A tibble: 6 × 2
  pollster     average_error
  <chr>                <dbl>
1 OPINA                 2.61
2 METRA SEIS            2.70
3 VOX PÚBLICA           3.09
4 TOP POSITION          3.16
5 GALLUP                3.18
6 IBERCONSULTA          3.26

The Most Accurate Polling Houses were: OPINA, METRA SEIS and VOX PÚBLICA, with average errors of +/- 2-3% meaning that pollster’s estimates were, on average, 2-3 percentage points different from the actual results. This error refers to a general average over the years we have considered in our data set (2008-2019)

Less Accurate Polling Houses

Code
print(head(worst_pollsters))
# A tibble: 6 × 2
  pollster       average_error
  <chr>                  <dbl>
1 TÁBULA V                6.01
2 IKERFEL                 5.85
3 MYWORD                  5.81
4 SYM CONSULTING          5.65
5 DEMOMÉTRICA             5.37
6 CELESTE-TEL             5.33

And the worst were: TÁBULA V, IKERFEL and MYWORD where the pollster’s estimates were, approximately, 6 percentage points different from the actual results.

Original Question 1

Participation analysis: How does participation at the polls evolve over the years (2008-2019)?

Code
# prepare Election data
election_data_turnout <- final_election_data |>
  mutate(election_year = year(make_date(anno, mes))) # extract election year

# calculate Total census and Votes per year
turnout_by_year <- election_data_turnout |>
    group_by(election_year) |>
    summarise(
      total_census = sum(censo, na.rm = TRUE),  # calculate total census
      total_votes = sum(votos_candidaturas, na.rm = TRUE) + sum(votos_blancos, na.rm = TRUE), # calculate total valid votes
        .groups = "drop"
        )

# calculate Turnout Percentage
turnout_by_year <- turnout_by_year |>
    mutate(turnout_percentage = (total_votes / total_census) * 100) 

print(turnout_by_year)
# A tibble: 5 × 4
  election_year total_census total_votes turnout_percentage
          <dbl>        <dbl>       <dbl>              <dbl>
1          2008  15446544456 11562392088               74.9
2          2011  15639297480 11069819064               70.8
3          2015  15791474256 11436300336               72.4
4          2016  15776182752 10913638608               69.2
5          2019  31769422872 22890157128               72.1

Now we visualize the results:

Code
ggplot(turnout_by_year, aes(x = election_year, y = turnout_percentage)) +
  geom_line(color = "darkblue", linewidth = 1.2) +
  geom_point(color = "darkblue", size = 3) +
    geom_text(aes(label = sprintf("%.1f%%", turnout_percentage), # add text labels with percentage
                vjust = ifelse(election_year %in% c(2011, 2016), 1.5, -1)), # adjust vjust for 2011 and 2016
               size = 3, color = "darkblue") + 
  scale_x_continuous(breaks = unique(turnout_by_year$election_year)) +
  scale_y_continuous(limits = c(65, 80), breaks = seq(65, 80, 5), expand = c(0,0)) +
    labs(title = "Turnout Percentage in Spanish Congressional Elections (2008-2019)",
         x = "Election Year",
         y = "Turnout Percentage") +
  theme_minimal() +
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
         panel.grid.major.y = element_line(color = "gray", linewidth = 0.3),
          panel.grid.minor.y = element_blank())

In the 2008 elections, participation reached 75%, which is high (above the Spanish average) and similar to that of the first democratic elections in Spain (1977), which reflected strong citizen commitment and consolidated democracy. However, in 2011 participation fell to 70.8%, which can be explained in part by the impact of the economic, social and political crisis that affected the country, generating distrust and uncertainty among citizens. In 2015, turnout rose almost 2 percentage points, probably driven by the emergence of new political actors such as Podemos, which offered an alternative to the traditional two-party system, sparking the interest and hope of many voters. However, in 2016 participation fell again, reaching the lowest level since 2008 (69.2%), reflecting possible disenchantment or political fatigue. In 2019, participation rebounded again, but did not reach the high levels of 2008, suggesting a partial recovery of electoral interest, although still far from the participation peaks observed in previous times.

Original Question 2

Which parties received the most votes in each province from 2008 to the 2019 elections?

(Important: the number of votes is not equal to the final result translated into seats)

Code
# we extract the party with the most votes in each province
resultado_max_votos <- final_election_data |>
  filter(anno %in% c(2008, 2011, 2015, 2016, 2019)) |>
  group_by(anno, codigo_provincia, siglas) |>
  summarise(total_votos = sum(ballots, na.rm = TRUE)) |>
  slice_max(order_by = total_votos, n = 1)

print(resultado_max_votos)
# A tibble: 260 × 4
# Groups:   anno, codigo_provincia [260]
    anno codigo_provincia siglas total_votos
   <dbl> <chr>            <chr>        <dbl>
 1  2008 01               PSOE         69179
 2  2008 02               PP          113317
 3  2008 03               PP          488333
 4  2008 04               PP          161366
 5  2008 05               PP           66747
 6  2008 06               PSOE        222269
 7  2008 07               PSOE        207644
 8  2008 08               Other      1514486
 9  2008 09               PP          116241
10  2008 10               PSOE        139487
# ℹ 250 more rows

In order to see the results clearly we need a base map which we take from here: https://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=CAANE , especifically Cartografía Base de España del Atlas Nacional de España 1:10.000.000 SHAPE 2006 - 2024

Code
provincias <- st_read("SIANE_CARTO_BASE_S_10M/vigente/se89_10_admin_prov_a_x.shp")
Reading layer `se89_10_admin_prov_a_x' from data source 
  `C:\Users\laura\OneDrive\Escritorio\Máster\Semester 1\Data Programming\GROUP TASK\group\SIANE_CARTO_BASE_S_10M\vigente\se89_10_admin_prov_a_x.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 50 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -9.298027 ymin: 35.25069 xmax: 4.314508 ymax: 43.77594
Geodetic CRS:  ETRS89
Code
mapa_con_votos <- provincias |>
  left_join(resultado_max_votos, by = c("id_prov" = "codigo_provincia"))

# colors for parties
siglas_colors <- c(
  "BNG" = "skyblue", 
  "CS" = "#EB6109",
  "CIU" = "#F7D117",
  "EH BILDU" = "lightgreen",
  "ERC" = "#FFB000",
  "MP" = "#00B1E0",
  "Other" = "#D3D3D3",  
  "EAJ-PNV" = "#228B22",
  "PP" = "#1D84CE",
  "PSOE" = "#FF0000",
  "PODEMOS-IU" = "#6A097D",
  "VOX" = "#5BC035"
)

# order in the legend
mapa_con_votos$siglas <- factor(mapa_con_votos$siglas, levels = c("PP", "PSOE", "PODEMOS-IU", "EAJ-PNV", "Other"))
mapa_con_votos$siglas[is.na(mapa_con_votos$siglas)] <- "Other"

# final map
ggplot(mapa_con_votos) +
  geom_sf(aes(fill = siglas)) +  
  scale_fill_manual(values = siglas_colors) +  
  facet_wrap(~anno) +  
  theme_minimal() +  
  labs(title = "Parties with the most votes per province in general elections in Spain (2008-2019)",
       fill = "Party") +  
  theme(
    plot.title = element_text(size = 14, hjust = 0.5), 
    axis.text = element_blank(), 
    axis.ticks = element_blank(), 
    axis.title = element_blank(), 
    panel.grid = element_blank(), 
    legend.position = c(0.95, 0.05), 
    legend.justification = c(1, 0), 
    legend.direction = "vertical" 
  )

The parties with the most votes in almost the last two decades have been PP and PSOE, the great parties of the two-party system. In 2008, there was more balance between the parties voted in the provinces, but finally PSOE won. That explains the result in the following elections (2011) where there was a strong reaction from the right causing the PP to win. The PP also consolidated itself in the 2016 elections. It should be highlighted that in the 2015 elections no party obtained a majority to govern and no agreements were reached, which is why elections were repeated the following year. In 2019, the left reemerged and the PSOE won again** after the motion of censure against Rajoy (PP) the previous year (2018).

Furthermore, we see that there is no consistency throughout the whole peninsula, since we see over the years that in the southwest (part of Andalusia) people generally vote more for the PSOE, part of Castilla y León vote more PP, in Catalonia, other parties and in the Basque Country there’s more variety: Podemos, PSOE and PNV

Animated GIF

Original Question 3

Analyze the proportionality of the vote in the 2019 November elections.

What we will do is study the proportion of votes obtained by each party at a national level, and then compare that to the proportion of “escaños” that each of those parties finally obtained. This will allow us to see which parties obtain a bigger proportion “escaños” compared to their proportion of votes and viceversa.

Code
# First, let´s filter the votes by party so that we obtain only those for the elections of Novembers, 2019
votes_2019 <- votes_by_party_all |> 
  filter(anno == 2019, mes == 11)

# Now, let´s obtain de total of valid votes obtained in those elections in general, for all of the parties
national_total_2019 <- sum(votes_2019$total_votes)

# And, finally, let´s se the percentage of votes that each party obtained in a national level

votes_2019 <- votes_2019 |> 
  group_by(siglas) |> 
  summarise(votes_by_party_2019 = sum(total_votes)) |> 
  mutate(percentage_votes = (votes_by_party_2019 / national_total_2019) * 100)

votes_2019 <- votes_2019 |> rename(party = siglas)

votes_2019
# A tibble: 12 × 3
   party      votes_by_party_2019 percentage_votes
   <chr>                    <dbl>            <dbl>
 1 BNG                     119597            0.501
 2 CIU                          0            0    
 3 CS                     1637341            6.86 
 4 EAJ-PNV                 377502            1.58 
 5 EH-BILDU                276535            1.16 
 6 ERC                     869786            3.65 
 7 MP                      370222            1.55 
 8 Other                  3036267           12.7  
 9 PODEMOS-IU             2550852           10.7  
10 PP                     5021622           21.0  
11 PSOE                   5961901           25.0  
12 VOX                    3640377           15.3  
Code
# Now, let´s study the proportion of "escaños" obtained by each party. Our Congress has 350 "escaños":
escanos_total <- 350

party <- c("BNG", "CIU", "CS", "EAJ-PNV", "EH-BILDU", "ERC", "MP", "PODEMOS-IU", "PP", "PSOE", "VOX", "Other")

escanos_asigned <- c(1, 0, 10, 6, 5, 13, 3, 35, 89, 120, 26, 42)

percentage_escanos <- (escanos_asigned / escanos_total) * 100

# Again, let´s see the percentage of "escaños" that each of the parties obtained

results_2019_nov <- data.frame(
  party = party,
  escanos_asigned = escanos_asigned,
  percentage_escanos = percentage_escanos
)

print(results_2019_nov)
        party escanos_asigned percentage_escanos
1         BNG               1          0.2857143
2         CIU               0          0.0000000
3          CS              10          2.8571429
4     EAJ-PNV               6          1.7142857
5    EH-BILDU               5          1.4285714
6         ERC              13          3.7142857
7          MP               3          0.8571429
8  PODEMOS-IU              35         10.0000000
9          PP              89         25.4285714
10       PSOE             120         34.2857143
11        VOX              26          7.4285714
12      Other              42         12.0000000
Code
# Now, let´s compare the percentage of votes and the percentage of "escaños" by party
total_votes_2019 <- sum(votes_2019$votes_by_party_2019)

votes_2019$percentage_votes <- (votes_2019$votes_by_party_2019 / total_votes_2019) * 100

comparison <- merge(results_2019_nov, votes_2019[, c("party", "percentage_votes")], by = "party")

print(comparison)
        party escanos_asigned percentage_escanos percentage_votes
1         BNG               1          0.2857143        0.5012027
2         CIU               0          0.0000000        0.0000000
3          CS              10          2.8571429        6.8617084
4     EAJ-PNV               6          1.7142857        1.5820215
5    EH-BILDU               5          1.4285714        1.1588927
6         ERC              13          3.7142857        3.6450672
7          MP               3          0.8571429        1.5515127
8       Other              42         12.0000000       12.7242760
9  PODEMOS-IU              35         10.0000000       10.6900167
10         PP              89         25.4285714       21.0444287
11       PSOE             120         34.2857143       24.9849153
12        VOX              26          7.4285714       15.2559580
Code
# And, finally, let´s take a look at a plot that will allow us to visualize this comparison

table_long <- pivot_longer(
  comparison,
  cols = c(percentage_escanos, percentage_votes),
  names_to = "type",
  values_to = "percentage"
)

table_long$party <- fct_reorder(table_long$party, table_long$percentage, .fun = max, .desc = TRUE)

ggplot(table_long, aes(x = party, y = percentage, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Comparison of percentage of escaños vs votes by party",
    x = "Party",
    y = "Percentage (%)",
    fill = "Type"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5)
  )

As we can see, bigger parties such as PP and PSOE obtained a proportion of “escaños” bigger than their proportion of votes at a national level. Smaller parties tend to work the other way around; and nationalist parties, such as ERC, Bildu… they tend to be almost perfetly proportional, obtaining the same proportion of “escaños” than their proportion of votes. This is due to the fact that in Spain we use as electoral districts the provinces, and that our electoral formula is a proportional one. Basically, this distribution of the districts and the choice of using a proportional formula leads to the fact that, in provinces with smaller populations, each vote has “more value” than in more populated provinces. For example, a vote in Soria is more valuable than one in Madrid. This leads to the fact that the biggest parties, PP and PSOE, tend to obtain a bigger proportion of “escaños” than their real proportion of votes.

Animated GIF

Original Question 4:

Protest vote analysis (2008-2019)

Code
# 1. Calculate Total Votes, Blank and Null Votes per Year
votes_by_year <- election_data_turnout |>
  group_by(election_year) |>
  summarise(
    total_votes = sum(votos_candidaturas, na.rm = TRUE) + sum(votos_blancos, na.rm = TRUE) + sum(votos_nulos, na.rm = TRUE),
    total_blank_votes = sum(votos_blancos, na.rm = TRUE),
    total_null_votes = sum(votos_nulos, na.rm = TRUE),
        .groups = "drop"
  )

# 2. Calculate Percentages of Blank and Null Votes
votes_by_year <- votes_by_year |>
  mutate(
    blank_percentage = (total_blank_votes / total_votes) * 100,
    null_percentage = (total_null_votes / total_votes) * 100
  )

# 3. Create Blank Votes Plot
plot_blancos <- ggplot(votes_by_year, aes(x = election_year, y = blank_percentage)) +
  geom_line(color = "darkgreen", linewidth = 1.2) +
  geom_point(color = "darkgreen", size = 3) +
  geom_text(aes(label = sprintf("%.1f%%", blank_percentage),
                 vjust = ifelse(election_year %in% c(2011, 2016), 1.5, -1)),
                 size = 3, color = "darkgreen") + 
  scale_x_continuous(breaks = unique(votes_by_year$election_year)) +
  scale_y_continuous(limits = c(0, 3), breaks = seq(0, 3, 1), expand = c(0,0)) +
  labs(title = "Blank Votes Percentage",
       x = "Election Year",
       y = "Percentage") +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(color = "gray", linewidth = 0.3),
        panel.grid.minor.y = element_blank())
Code
# 4. Create Null Votes Plot
plot_nulos <- ggplot(votes_by_year, aes(x = election_year, y = null_percentage)) +
  geom_line(color = "darkred", linewidth = 1.2) +
  geom_point(color = "darkred", size = 3) +
  geom_text(aes(label = sprintf("%.1f%%", null_percentage),
                 vjust = ifelse(election_year %in% c(2008, 2016), 1.5, -1)),
                 size = 3, color = "darkred") + 
  scale_x_continuous(breaks = unique(votes_by_year$election_year)) +
    scale_y_continuous(limits = c(0, 3), breaks = seq(0, 3, 1), expand = c(0,0)) +
  labs(title = "Null Votes Percentage",
       x = "Election Year",
       y = "Percentage") +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(color = "gray", linewidth = 0.3),
        panel.grid.minor.y = element_blank())


# 5. Combine and display plots
combined_plot <- plot_blancos + plot_nulos + plot_layout(ncol = 2) # Arrange vertically


print(combined_plot) # Display the combined plot

2011 is the year with the most blank and null votes percentages, which could be attributed to the economic crisis and the subsequent discontent with politics, which we can attribute as well to the rise of new parties. The peaks in blank and null votes in 2011 align with widespread discontent during the economic crisis and the Movimiento 15M, which protested political corruption and austerity. This reflects frustration with traditional parties (PSOE and PP) and a lack of viable alternatives at the time. The decline in blank and null votes in 2015–2016 corresponds to the rise of Podemos and Ciudadanos, which offered new political options inspired by 15M ideals. We see the numbers for both decrease again for the 2015 and 2016 election, although it slightly raises again in 2019.

Thank you!

Do you have any questions?

Animated GIF